gtaptools

Published

May 22, 2023

Introduction

Welcome to the tutorial for the gtaptools package in R, which is currently under development. The primary objective of this material is to provide practical examples of using gtaptools for researchers who work with Computable General Equilibrium (CGE) models.

The package aims to improve file management, increase productivity, and allow users to build a scriptable pipeline that promotes reproducibility. The package also provides tools for graphical visualization that increase the analytical potential of the database and results. From a broader perspective, this package is part of a long-term agenda that involves the development of tools to bring the functionality and flexibility of the R language to CGE modeling, such as the HARr and TabloToR projects¹.

The gtaptools package is designed to be user-friendly and is accompanied by detailed documentation and built-in example code that can be explored in RStudio. We hope that this manual will serve as practical support for users, and we welcome community feedback and contributions.

Installation

To use the gtaptools package, you need to have R installed on your computer, which can be downloaded from here. Additionally, we recommend downloading RStudio, available here, which provides a user-friendly interface to work with R.

You can install the development version of gtaptools from GitHub with:

# If the devtools package is not already installed, uncomment and run the line below.
# install.packages("devtools")
devtools::install_github("tsimonato/gtaptools")

Compatibility

The tools of the package mainly work with files in .har, .sl4 and other formats generated by GEMPACK, which is widely used in CGE modeling². As a result, some of the package’s functionalities require a certain degree of familiarity with the structure of the data used in the GEMPACK suite³. This knowledge is not strictly required to use the package for manipulating and visualizing CGE model data, but lacking it steepens the learning curve.

The package also includes functions for data cleaning, manipulation, and visualization that can be used with other data formats, such as R data.frames and arrays. Although GEMPACK is only compatible with Windows, the package’s features work on other operating systems supported by R and RStudio, such as Linux and macOS.

Tools

This section of the manual provides a comprehensive overview of the package’s tools, organized by functionality. The functions are grouped into three topics: Data/file management, Data viz (static and reactive), and Report automation. Each topic is further divided into subtopics that describe the specific functions and their intended use cases. Understanding the package’s functions can help users streamline their data analysis workflows and create effective visualizations and reports.

Data/file management

This section presents the tools that manipulate databases. They have a simple design, and the syntax adopted for the custom functions will be familiar to GEMPACK users.

- gtaptools::har_shape

har_shape is an efficient tool for binding databases and modifying headers in array or data.frame format. It allows users to combine various databases quickly and easily while generating new variables through custom calculations. This function is especially useful for integrating .har databases with other databases in the R environment, providing a versatile solution for analysts who need to work with different classes of datasets.

What this tool does:
  • Reads and combines .har files, arrays and data.frames.
  • Creates/changes headers from calculations.
  • Writes headers to disk.

In general, the execution of the function follows three steps:

  1. Create a list of the datasets that must be combined (input_data).
  2. Create a list with the calculations that will be executed (new_calculated_vars).
  3. Execute the function, specifying whether any headers should be deleted (del_headers), and whether and where the sets (export_sets) and the numerical database (output_har_file) should be saved.

Let’s assume we want the following:

path_to_har <- gtaptools::templates("oranig_example.har")

input_data <- list( # (1)
    path_to_har, # Path to .har database
    list(
      input_data = gtaptools::example_df, # Data.frame
      # Description of the header that will be created from the data.frame:
      header = quote(`1MAR`[c("COM", "SRC", "IND", "MAR")])
    )
 )

output_har_r <-
  gtaptools::har_shape(
    input_data = input_data,
    new_calculated_vars = NULL, # (2)
    output_har_file = "gtaptools_shape_example1.har" # (3)
  )

(1) Combines a database in .har format on disk, in the path "path_to_har", with a new header `1MAR`[c("COM", "SRC", "IND", "MAR")] created from the data.frame gtaptools::example_df.
(2) No calculation is done to generate/change headers.
(3) Writes the output to "gtaptools_shape_example1.har" and returns it to the R environment as the list object output_har_r.

Header names that start with digits, like 1MAR, must be enclosed in backticks (` `) to be properly recognized in R.

The arrays and data.frames used as input (like gtaptools::example_df in (1)) must contain columns whose names correspond to the sets mentioned in the header. In the case of data.frames, the column with the numerical values must have the same name as the header that will be generated (1MAR in this example).

Therefore, to generate `1MAR`[c("COM", "SRC", "IND", "MAR")], the data.frame used has categorical columns that correspond to the sets and a numerical column named after the header. Take a look at the first 20 rows of this data.frame below.

DT::datatable(gtaptools::example_df[1:20,])
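
As a minimal sketch of the layout described above, the block below builds a data.frame of the same shape from scratch using base R only. The set elements and the object name my_df are hypothetical and chosen purely for illustration; they are not taken from the package data.

```r
# Hypothetical set elements, for illustration only.
sets <- list(
  COM = c("Comm1", "Comm2"),
  SRC = c("dom", "imp"),
  IND = c("Ind1", "Ind2"),
  MAR = c("Trade", "Transport")
)

# One row per combination of set elements (2 * 2 * 2 * 2 = 16 rows).
my_df <- expand.grid(sets, stringsAsFactors = FALSE)

# The numerical column is named after the header that will be generated.
# Assigning it with [[<- keeps the name "1MAR" exactly as written.
my_df[["1MAR"]] <- 0

str(my_df)
```

A data.frame like this could then take the place of gtaptools::example_df in the input_data list shown earlier, with header = quote(`1MAR`[c("COM", "SRC", "IND", "MAR")]).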

Now let’s assume another scenario.

path_to_har <- gtaptools::templates("oranig_example.har")

input_data <- list( # (1)
    path_to_har # Path to .har database
 )

new_calculated_vars <- list( # (2)
   quote(MARC["COM"] := `1MAR`), # Sum 1MAR to set COM
   quote(MACM[c("COM")] := apply(`1MAR`, c("COM"), mean, na.rm = T)),
   quote(MULT[c("COM", "IND")] := solve(MAKE)), # Invert the MAKE matrix
   quote(NSET := c("Comm1", "Comm2")) # Create sets
)

output_har_r <-
  gtaptools::har_shape(
    input_data = input_data,
    new_calculated_vars = new_calculated_vars,
    del_headers = c("1LND"), # (3)
    export_sets = "gtaptools_shape_example2_sets.har",
    output_har_file = "gtaptools_shape_example2.har"
)

(1) Reads a .har from disk as input.
(2) Generates the MARC and MACM headers, which aggregate 1MAR to COM by sum (the default) and by mean, respectively. Generates MULT, with the "COM" and "IND" sets, which is the inverse of the MAKE matrix. Creates a new set header NSET with the elements “Comm1” and “Comm2”.
(3) Deletes the 1LND header. Saves the sets and the numeric headers in separate files.

MACM is the aggregation by mean of 1MAR. To aggregate by a function other than sum, the expression apply(`1MAR`, c("COM"), mean) was used, which can be read as an aggregation of 1MAR to "COM" applying the mean function. The na.rm = T argument is not necessary here; it only shows how other parameters of the aggregation function (mean in this case) can be specified.

Note the syntax adopted in the formulas. It is essential to use := (and not =), [ ], c("Set1", "Set2", ...) when there is more than one set, and to wrap the formula in quote( ).
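
To make the role of quote( ) more concrete, the base R sketch below captures one of the formulas and inspects its parts. It only illustrates that quote( ) stores the formula unevaluated as a call built around the := operator; how har_shape processes that call internally is not shown here and is not assumed.

```r
# Capture the formula without evaluating it.
f <- quote(MARC["COM"] := `1MAR`)

is.call(f)            # The captured formula is a language object (a call)
as.character(f[[1]])  # The operator at the top of the call
f[[2]]                # Left-hand side: output header and its set
f[[3]]                # Right-hand side: the input header
```

Because nothing is evaluated at this point, headers such as MARC or 1MAR do not need to exist as R objects when the list of formulas is written.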

The aggregation performed in MARC["COM"] := `1MAR` is defined by the set indicated for the output header. Since the 1MAR header is composed of 4 sets ("COM", "SRC", "IND", "MAR") and the output of only 1 ("COM"), the tool automatically aggregates the output to COM by sum.
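
The two kinds of aggregation discussed above can be reproduced on a small array in base R. This is only a sketch of the underlying array operation, assuming a hypothetical 3-dimensional array x with named dimensions; it does not call gtaptools.

```r
# Hypothetical array with named dimensions (sets).
x <- array(
  1:8,
  dim = c(2, 2, 2),
  dimnames = list(
    COM = c("Comm1", "Comm2"),
    SRC = c("dom", "imp"),
    IND = c("Ind1", "Ind2")
  )
)

# Aggregation to COM by sum: SRC and IND are collapsed.
by_sum <- apply(x, "COM", sum)

# Aggregation to COM by mean, passing an extra argument on to mean().
by_mean <- apply(x, "COM", mean, na.rm = TRUE)

by_sum
by_mean
```

apply() accepts the set name as MARGIN because the array has named dimnames, which mirrors the way sets are referenced by name in the formulas.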

Calculations in this tool are done between arrays, and R offers a vast range of possibilities for manipulating arrays. For example, conditional statements and intermediate aggregations of sets, as commonly used in GEMPACK scripts, can be applied.

path_to_har <- gtaptools::templates("oranig_example.har")

input_data <- list( # (1)
    path_to_har # Path to .har database
 )

new_calculated_vars <- list(
   quote(SHMA[c("COM", "IND")] := MAKE / apply(MAKE, c("COM"), sum)), # (2)
   quote(SH3B[c("COM", "SRC")] := `3BAS` / ifelse(apply(`3BAS`, c("COM"), sum)==0, 1, apply(`3BAS`, c("COM"), sum))), # (3)
   quote(IF3B[c("COM", "SRC")] := `3BAS` + ifelse(`SH3B` > 0.95, `5TAX`, 0)) # (4)
)

output_har <-
  gtaptools::har_shape(
    input_data = input_data,
    new_calculated_vars = new_calculated_vars,
    del_headers = NULL,
    export_sets = F, # (5)
    output_har_file = "gtaptools_shape_example3.har"
)

(1) Reads a .har from disk as input.
(2) Creates SHMA, which is the share of MAKE by "COM".
(3) Creates SH3B, which is the share of 3BAS in "COM", with a conditional to guard against division by zero.
(4) Creates IF3B, which is 3BAS added to 5TAX if SH3B > 0.95.
(5) Saves the output .har file without including the sets in it (export_sets = F).

To guard against division by zero, we can use ifelse statements. In (3), when the denominator is zero (apply(`3BAS`, c("COM"), sum) == 0), the value 1 is used in the denominator; otherwise, apply(`3BAS`, c("COM"), sum) is used.
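
The same guard can be tried on a small matrix in base R. The sketch below uses a hypothetical COM x SRC matrix named bas in which one commodity has a zero total; the values are illustrative only.

```r
# Hypothetical COM x SRC matrix; Comm3 has a zero row total.
bas <- matrix(
  c(30, 0, 0,   # column "dom"
    10, 5, 0),  # column "imp"
  nrow = 3,
  dimnames = list(COM = c("Comm1", "Comm2", "Comm3"), SRC = c("dom", "imp"))
)

# Totals by COM.
total <- apply(bas, "COM", sum)

# Replace zero totals by 1 so that the division is always defined.
denominator <- ifelse(total == 0, 1, total)

# COM is the first dimension, so the length-3 denominator is recycled
# down each column and every row is divided by its own total.
share <- bas / denominator
share
```

Without the ifelse() guard, the Comm3 row would contain NaN (0 / 0) instead of 0.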

It is also possible to adopt cross-references, as in (4), where 3BAS is added to 5TAX only if SH3B > 0.95.

Note that the SH3B header was created and then used as an input in the following formula. This is possible because the tool processes the calculations sequentially. Therefore, it is necessary to follow this sequence: create the header > use it as input.

Check the data headers created and written to "gtaptools_shape_example3.har":

DT::datatable(as.data.frame.table(output_har$SHMA))
DT::datatable(as.data.frame.table(output_har$SH3B))
DT::datatable(as.data.frame.table(output_har$IF3B))

Arrays converted to data.frame through as.data.frame.table() function have “Freq” as the name of the column of numerical values. Keep this in mind when using this type of conversion.
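
If a more descriptive column name is preferred, base R allows it to be set at conversion time through the responseName argument of as.data.frame.table(). A minimal sketch, using a hypothetical array x:

```r
# Hypothetical array with named dimensions (sets).
x <- array(
  1:4,
  dim = c(2, 2),
  dimnames = list(COM = c("Comm1", "Comm2"), IND = c("Ind1", "Ind2"))
)

# Default conversion: the value column is called "Freq".
names(as.data.frame.table(x))

# responseName sets the name of the value column at conversion time.
df <- as.data.frame.table(x, responseName = "SHMA", stringsAsFactors = FALSE)
names(df)
```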

Another practical example is the calculation of shocks that relate the .har database to external data, for use as a shock input file in RunDynam. Let’s assume we have information on an increase in household consumption, in monetary terms, of 70 billion for H01 households and 50 billion for H02 households. Let’s build a data.frame to represent this economic policy:

path_to_har <- gtaptools::templates("oranig_example.har")

input_har <-
  gtaptools::har_shape(
    input_data = path_to_har # (1)
  )

set_hou <- input_har$HOU
set_years <- paste0("Y", 2015:2018)
POL <- expand.grid(HOU = set_hou, # (2)
                   YEAR = set_years)

POL$POL <- 0
POL[POL$YEAR == "Y2015", "POL"] <- c(70e9, 50e9, 0, 0, 0, 0, 0, 0, 0, 0) # (3)

DT::datatable(POL)

(1) Reads the .har database.
(2) Creates a data.frame with the YEAR and HOU sets.
(3) Fills the data.frame with values in monetary units. The DT::datatable(POL) call displays the resulting POL.

The POL data.frame stores the policy values in monetary units. We can then calculate the share between the values in POL and the household consumption represented by 3PUR[c("COM", "HOU")] in the base "oranig_example.har".

input_data <- list( # (1)
  list(
    input_data = POL, # Data.frame
    header = quote(POL[c("YEAR", "HOU")])
  ),
  list(
    input_data = input_har$`3PUR`, # Array
    header = quote(`3PUR`[c("COM", "HOU")])
  )
)

new_calculated_vars <- list(
  quote(SHOC[c("YEAR", "HOU")] := 100*(POL/(apply(`3PUR`, "HOU", sum)*1e6))) # (2)
)

output_har <-
  gtaptools::har_shape( # (3)
    input_data = input_data,
    new_calculated_vars = new_calculated_vars,
    output_har_file = "gtaptools_shape_example_shock.har"
  )

(1) Creates a list composed of the data.frame POL and the 3PUR header of the original .har database.
(2) Defines the calculation of the shock. Note that 3PUR is “pre-aggregated” to "HOU" in apply(`3PUR`, "HOU", sum) and multiplied by 1e6 because of the monetary unit adopted in the Brazilian Input-Output Matrix used to calibrate this CGE database.
(3) Runs the calculations and writes the file "gtaptools_shape_example_shock.har".

Check the data headers in this output file:

DT::datatable(as.data.frame.table(output_har$SHOC))
DT::datatable(as.data.frame.table(output_har$`3PUR`))
DT::datatable(as.data.frame.table(output_har$POL))
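
The arithmetic behind the SHOC formula can be checked by hand on a reduced example. The sketch below uses base R only, with hypothetical consumption values (in millions) for two commodities and two households. It uses sweep() to make the alignment by HOU explicit; this is an assumption made for the illustration and is not a description of how har_shape aligns sets internally.

```r
# Hypothetical household consumption by commodity, in millions.
pur <- matrix(
  c(400000, 600000,   # H01
    200000, 300000),  # H02
  nrow = 2,
  dimnames = list(COM = c("Comm1", "Comm2"), HOU = c("H01", "H02"))
)

# Policy values in monetary units (YEAR x HOU).
pol <- matrix(
  c(70e9, 0,   # H01: Y2015, Y2016
    50e9, 0),  # H02: Y2015, Y2016
  nrow = 2,
  dimnames = list(YEAR = c("Y2015", "Y2016"), HOU = c("H01", "H02"))
)

# Total consumption by household, converted from millions to units.
cons_by_hou <- apply(pur, "HOU", sum) * 1e6

# Divide each HOU column of pol by that household's total consumption.
shock <- 100 * sweep(pol, 2, cons_by_hou, "/")
shock
```

With these numbers, 70 billion over a consumption of 1 trillion gives a 7% shock for H01 in Y2015, and 50 billion over 500 billion gives 10% for H02.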

- gtaptools::agg_har

Data viz

Spatial data

- gtaptools::plot_map

The plot_map tool creates static and reactive maps with the ggplot2 and leaflet packages. It requires an input_data data.frame with at least one numeric column and one categorical column with region IDs such as iso_a2, iso_a3, or iso_n3. The main parameters are:
  • value_var: the numeric variable to be plotted on the map.
  • region_var: an optional variable containing the region labels used to aggregate the sf object.
  • colors: the color palette or custom color break vector to be used.
  • borders_color and borders_size: the color and size of the border lines.
  • legend_title and legend_pos: the legend’s title and position.
  • legend_labels: custom numeric or character labels that replace the numeric color scale labels.
  • reactive: should be F for static maps.
  • fillOpacity: the transparency of the color fill layer.

The tool supports various color palettes, such as Viridis, Color Brewer Sequential, and Color Brewer Diverging, as well as custom color palettes. Check the manual for more details.

What this tool does:
  • Plots static and reactive global maps from data in data.frames.

In this way, we can summarize the implementation of this tool in three steps:

A. Create a data.frame (input_data) that relates each ISO country code to the column with the numerical values (value_var) that will be plotted.
B. Customize the legend elements if needed.
C. Customize color elements if needed.

The data.frame indicated in input_data and the spatial object sf are related through the iso_a2, iso_a3, and iso_n3 match vectors. Therefore, input_data must contain at least one column whose name and content are consistent with these ISO country codes to match correctly. Although the user will likely provide the region aggregation and labeling variable (region_var) via input_data, some built-in vectors are also available.

DT::datatable(gtaptools::template_map[c("iso_a2", "iso_a3", "iso_n3")])
sf <- rnaturalearth::ne_countries(scale = "small", returnclass = "sf")
sf <- sf::st_drop_geometry(sf)
DT::datatable(sf[1:20,])

Colors are not plotted for regions whose value_var is non-numeric or NA.

Check out some usage examples:

gtaptools::plot_map(
  input_data = gtaptools::template_map, # (1)
  value_var = "gdp_pc", # (2)
  region_var = "name", # (3)
  colors = "viridis", # (4)
  legend_title = "GDP per capita 2021, PPP</br>(constant 2017 international $)" # (5)
)

(1) A data.frame with at least one column to match “iso_a2”, “iso_a3” or “iso_n3”.
(2) Numerical variable to be plotted.
(3) Region variable for spatial aggregation and labeling.
(4) Color palette.
(5) In an interactive map, HTML tags such as </br> can be used in the legend title to break the line.

The static version of the map (reactive = F) uses \n for the line break instead:

gtaptools::plot_map(
  input_data = gtaptools::template_map, 
  value_var = "gdp_pc", 
  region_var = "name", 
  colors = "viridis", 
  legend_title = "GDP pc 2021\n(con. 2017 int. $)",
  reactive = F 
)

Now let’s apply a regional aggregation and customize the legend labels:

gtaptools::plot_map(input_data = gtaptools::template_map,
                    value_var = "gdp_pc_ave",
                    region_var = "subregion", # (1)
                    colors = "RdBu",
                    legend_labels = c("label1", "label2", "label3"), # (2)
                    legend_title = 'GDP per capita 2021, PPP</br>(constant 2017 international $)',
                    legend_pos = "bottomright", # (3)
                    reactive = T
)

(1) The values of region_var and value_var are shared by several countries; therefore, a regional aggregation is performed before plotting the map.
(2) Custom legend labels.
(3) The position of the legend has also been changed.

The static version of the same map:

gtaptools::plot_map(
  input_data = gtaptools::template_map,
  value_var = "gdp_pc_ave",
  region_var = "subregion",
  colors = "RdBu",
  legend_labels = c("label1", "label2", "label3"), # (1)
  legend_title = "GDP pc 2021\n(con. 2017 int. $)",
  legend_pos = "bottom", # (2)
  reactive = F
)

(1) For static maps, the number of labels must equal the number of labels automatically generated by ggplot2.
(2) Note that the options for legend_pos in static maps differ from those in reactive maps.

There must not be different numerical values (value_var) for the same region (region_var). Otherwise, the regional aggregation will not be complete.
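
A quick way to verify this condition before plotting is to count the distinct values of value_var within each region. The sketch below uses base R and a small hypothetical data.frame (map_data) in which one region deliberately carries two different values.

```r
# Hypothetical input: "South America" carries two different values.
map_data <- data.frame(
  subregion = c("South America", "South America",
                "Western Europe", "Western Europe"),
  gdp_pc_ave = c(15000, 15500, 45000, 45000)
)

# Count the distinct values of value_var within each region.
n_values <- tapply(
  map_data$gdp_pc_ave,
  map_data$subregion,
  function(v) length(unique(v))
)

# Regions with more than one value would break the aggregation.
names(n_values)[n_values > 1]
```

An empty result means every region has a single value and the input is consistent with the requirement above.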

Finally, let’s customize some color elements.

gtaptools::plot_map(
  input_data = gtaptools::template_map,
  value_var = "gdp_pc",
  region_var = "name",
  colors = c("red" = 0, "grey" = 15000, "blue" = 70000, "black" = 120000), # (1)
  fillOpacity = .7, # (2)
  borders_color = "black", # (3)
  borders_size = 2,
  legend_title = "GDP per capita 2021, PPP</br>(constant 2017 international $)"
)

(1) Sets custom breaks for the color palette: each color has its full hue when it reaches the specified numerical value and gradually shades from one color to the next in the interval between the indicated values.
(2) Changes the color transparency (reactive maps only).
(3) Sets the region borders (borders_color) to black with 50% transparency and their thickness (borders_size) to 2.

The static version of the same map:

gtaptools::plot_map(
  input_data = gtaptools::template_map,
  value_var = "gdp_pc",
  region_var = "name",
  colors = c("red" = 0, "grey" = 15000, "blue" = 70000, "black" = 120000), 
  borders_color = "black", 
  borders_size = 1,
  legend_title = "GDP pc 2021\n(con. 2017 int. $)",
  reactive = F
)
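
To picture the gradient between custom breaks described for the colors vector above, the sketch below interpolates linearly between neighboring colors in base R. The helper color_at() is hypothetical and is not part of gtaptools; it only illustrates the idea of a piecewise gradient and makes no claim about how plot_map computes colors internally. It expects a single value within the range of the breaks.

```r
# Break vector in the same form as the colors argument above.
breaks <- c(red = 0, grey = 15000, blue = 70000, black = 120000)

# color_at() is a hypothetical helper, not part of gtaptools.
color_at <- function(v) {
  b <- unname(breaks)
  # Index of the interval [b[i], b[i + 1]] that contains v.
  i <- findInterval(v, b, rightmost.closed = TRUE)
  # Relative position of v inside that interval, from 0 to 1.
  pos <- (v - b[i]) / (b[i + 1] - b[i])
  # Linear interpolation between the two neighboring colors.
  ramp <- grDevices::colorRamp(names(breaks)[c(i, i + 1)])
  grDevices::rgb(ramp(pos), maxColorValue = 255)
}

color_at(0)       # Full red
color_at(42500)   # Halfway between grey and blue
color_at(120000)  # Full black
```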

Tip

When manually defining color breaks in colors, it helps to know the distribution of value_var. A simple histogram is enough for this.
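
A minimal sketch of this check in base R is shown below. The vector values is hypothetical and stands in for the column to be plotted; in practice it would be something like gtaptools::template_map$gdp_pc.

```r
# Hypothetical values; in practice use the value_var column of input_data.
values <- c(800, 2500, 4200, 9000, 15000, 22000,
            38000, 55000, 70000, 115000, NA)

# hist() ignores non-finite values such as NA.
h <- hist(values, breaks = 10,
          main = "Distribution of value_var", xlab = "value_var")

# Quantiles give candidate positions for the color breaks.
q <- quantile(values, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
q
```

Placing the breaks near the quantiles, rather than at evenly spaced values, tends to spread the colors more evenly across regions when the distribution is skewed.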

Non-spatial data

- gtaptools::plot_bars

plot_bars plots bar charts using the ggplot2 and plotly R packages. This tool can be used to create static and reactive bar charts. The main parameters are:
  • input_data: the data.frame or array containing the data to be plotted.
  • x and y: the names of the variables plotted on the x and y axes, respectively.
  • x_label and y_label: the labels for the x and y axes, respectively.
  • fill: the variable used for stacking the bars.
  • facet: the variable used for faceting the graph.
  • palette: the name or number of the color palette to be used.
  • orientation_bars: the orientation of the bars, either “horizontal” or “vertical”.
  • rotate_x_labels: the angle, in degrees, by which the x-axis labels are rotated.
  • legend_title and legend_pos: the title and position of the legend.
  • gtap_theme: the template theme to be used.

What this tool does:
  • Plots static and reactive bar charts from data in data.frames or arrays.

Check out some usage examples:

path_to_har <- gtaptools::templates("gtap_example.har")
input_data <- gtaptools::har_shape(path_to_har)


gtaptools::plot_bars(
  input_data = input_data$EVFB, # (1)
  x = "REG", # (2)
  x_label = "Regions",
  y = "Freq", # (3)
  y_label = "Primary factor purchases (US$)",
  fill = "ENDW", # (4)
  facet = "ACTS", # (5)
  legend_title = "Endowments"
)

(1) A data.frame or array.
(2) The column of values plotted on the x-axis, followed by its axis label.
(3) The column of values plotted on the y-axis, followed by its axis label.
(4) The categorical variable that defines the fill colors of the bars by category.
(5) The categorical variable that defines the multiple frames/facets.

The static version of the same chart (reactive = F):

path_to_har <- gtaptools::templates("gtap_example.har")
input_data <- gtaptools::har_shape(path_to_har)


gtaptools::plot_bars(
  input_data = input_data$EVFB,
  x = "REG",
  x_label = "Regions",
  y = "Freq",
  y_label = "Primary factor purchases (US$)",
  fill = "ENDW",
  facet = "ACTS",
  legend_title = "Endowments",
  reactive = F
)

Report automation

Applying the tools to

GTAP

SIMPLE/SIMPLE-G

TERM

Acknowledgements

References